GE in TREC-2: Results of a Boolean Approximation Method for Routing and Retrieval
Abstract
As before, there is a tremendous variation in the topic-by-topic results, suggesting that a great deal more research is needed to find how to get the best results in different routing and retrieval scenarios. We are encouraged by the progress of our system, as well as of the overall field, in these experiments, and are hopeful that in the coming years we will learn how to combine our promising results in corpus analysis with the more mature ranking and retrieval models of some of the other systems.
… method for query construction and topic assignment in TREC. This worked surprisingly well in cases where the topic title was a good description (e.g., "Welfare Reform") and absolutely horribly for those with vague titles (e.g., "Find Innovative Companies"). We tried to recover from these by including more words from the description and narrative, but then we had to start recognizing the language of these descriptions, filtering out words like "relevant", "mention" and so forth. The fully automatic ad hoc system certainly didn't do as well as the manual routing system, but it was still at or above median for more than half of the ad hoc topics. Considering that this method could be used within the context of almost any legacy retrieval system, the result is worth noting. Furthermore, the generation of Boolean queries from natural language descriptions is an interesting, as well as practical, research problem, because many different retrieval systems can make some use of Boolean queries.
2.3 Ranking
Our document ranking method will be more fully described in the proceedings paper. In both routing and ad hoc, we used a set of word weights, acquired from the relevance judgements in the routing case and from the corpus data in the ad hoc case.
We combined the weighted frequency of these terms with an overall count of the number of topic hits per document, normalizing for document length, to produce a score for each document. This was the result of trying many different approaches on the test data, so it was definitely a good method for our system. However, in comparing our results with those of other systems, our precision curve across various recall points is not nearly as good as that of a system with really good ranking. In routing, we are not …
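The abstract names two steps without giving formulas: automatic Boolean query construction from topic statements (title words, plus description words with meta-words such as "relevant" and "mention" filtered out), and a score that combines weighted term frequency with a count of topic hits, normalized for document length. The Python sketch below shows one way these could fit together; it is not the authors' method. The query shape (all title words ANDed, description words used only for scoring), the extra meta-words and stopwords, the smoothed IDF weights standing in for the ad hoc "corpus data" weights, reading "topic hits" as distinct query terms present, the square-root length normalization, and the `hit_bonus` parameter are all hypothetical choices.

```python
import math
import re
from collections import Counter

# Words that describe the topic statement rather than its subject.
# "relevant" and "mention" are named in the text; the others are hypothetical.
META_WORDS = {"relevant", "mention", "documents", "document", "discuss",
              "find", "identify", "will", "must"}
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "and", "or", "for",
             "that", "which", "is", "are", "be"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens (no stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def content_words(text):
    """Distinct tokens in first-seen order, minus stopwords and meta-words."""
    words = []
    for tok in tokenize(text):
        if tok not in STOPWORDS and tok not in META_WORDS and tok not in words:
            words.append(tok)
    return words


def build_boolean_query(title, description=""):
    """Hypothetical query shape: title words are ANDed (all required);
    description words are optional and only affect the score."""
    required = content_words(title)
    optional = [w for w in content_words(description) if w not in required]
    return required, optional


def matches(query, doc_tokens):
    """True if the document contains every required (title) term."""
    required, _ = query
    present = set(doc_tokens)
    return all(term in present for term in required)


def idf_weights(terms, corpus):
    """Assumed stand-in for corpus-derived ad hoc weights: smoothed IDF."""
    doc_sets = [set(doc) for doc in corpus]
    n = len(doc_sets)
    return {t: math.log((n + 1) / (sum(t in d for d in doc_sets) + 1)) + 1.0
            for t in terms}


def score(query, doc_tokens, weights, hit_bonus=1.0):
    """Weighted term frequency plus a bonus per distinct query term present,
    divided by the square root of the document length (hypothetical formula)."""
    required, optional = query
    tf = Counter(doc_tokens)
    terms = required + optional
    weighted_tf = sum(weights.get(t, 0.0) * tf[t] for t in terms)
    hits = sum(1 for t in terms if tf[t] > 0)
    return (weighted_tf + hit_bonus * hits) / math.sqrt(max(len(doc_tokens), 1))


def rank(query, docs, weights):
    """Filter documents with the Boolean query, then sort by descending score."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    passed = [d for d, toks in tokens.items() if matches(query, toks)]
    return sorted(passed, key=lambda d: score(query, tokens[d], weights), reverse=True)


docs = {
    "d1": "Congress debated welfare reform and new work requirements.",
    "d2": "Welfare reform bill passes; states gain control over welfare programs.",
    "d3": "Stock markets rallied on Tuesday.",
}
query = build_boolean_query("Welfare Reform",
                            "Relevant documents will mention state welfare programs.")
weights = idf_weights(query[0] + query[1], [tokenize(t) for t in docs.values()])
ranking = rank(query, docs, weights)
print(ranking)  # ['d2', 'd1']
```

For routing, `weights` would instead be derived from relevance judgements; the sketch accepts any term-to-weight mapping. A vague title such as "Find Innovative Companies" loses "find" to the meta-word filter but still yields only generic terms, which mirrors the failure mode described above.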
Similar resources
Combining the Evidence of Multiple Query Representations for Information Retrieval, N.J. Belkin and P. Kantor
We report on two studies in the TREC-2 program that investigated the effect on retrieval performance of combination of multiple representations of TREC topics. In one of the projects, five separate Boolean queries for each of the 50 TREC routing topics and 25 of the TREC ad hoc topics were generated by 75 experienced online searchers. Using the INQUERY retrieval system, these queries were both ...
Experiments on Routing, Filtering and Chinese Text Retrieval in TREC-5
We describe our experiments in routing, filtering, and Chinese text retrieval. We based our routing and filtering experiments on our discriminant project algorithm. The algorithm sequentially constructs a series of orthogonal axes from the training documents using the Gram-Schmidt procedure. It then rotates the resulting subspace using principal component analysis so that the axes are ordered b...
Combining the Evidence of Multiple Query Representations for Information Retrieval
We report on two studies in the TREC-2 program which investigated the effect on retrieval performance of combination of multiple representations of TREC topics. In one of the projects, five separate Boolean queries for each of the 50 TREC routing topics and 25 of the TREC ad hoc topics were generated by 75 experienced online searchers. Using the INQUERY retrieval system, these queries were both...
Term importance, Boolean conjunct training, negative terms, and foreign language retrieval: probabilistic algorithms at TREC-5
The Berkeley experiments for TREC-5 extend those of TREC-4 in numerous ways. For routing retrieval we experimented with the idea of term importance in three ways: training on Boolean conjuncts of the most important terms, filtering with the most important terms, and, finally, logistic regression on presence or absence of those terms. For ad-hoc retrieval we retained the manual reformulations of...
TREC-3 Ad-Hoc, Routing Retrieval and Thresholding Experiments using PIRCS
The PIRCS retrieval system has been upgraded in TREC-3 to handle the full English collections of 2 GB in an efficient manner. For ad-hoc retrieval, we use recurrent spreading of activation in our network to implement query learning and expansion based on the best-ranked subdocuments of an initial retrieval. We also augment our standard retrieval algorithm with a soft-Boolean component. For rout...